Question 1:

Section 1:

This is to certify that the work I am submitting is my own. All external references and sources are clearly acknowledged and identified within the contents. I am aware of the University of Warwick regulation concerning plagiarism and collusion.

No substantial part(s) of the work submitted here has also been submitted by me in other assessments for accredited courses of study, and I acknowledge that if this has been done an appropriate reduction in the mark I might otherwise have received will be made.

The data set is provided by the Food Standards Agency. It contains the following variables:

Variable Description
Country This is the Country of the Local Authority
La_type Type of Local Authority
La_name Name of Local Authority
Totalestablishments_includingnotyetrated_outside Total number of establishments in the area, including those not yet rated and those outside the programme
establishmentsnotyetratedforintervention Number of establishments not yet rated for intervention
establishmentsoutsidetheprogramme Number of establishments that are outside the programme
Total_percent_of_broadly_compliantestablishmentsrated_a_e Percentage of establishments rated A to E that are broadly compliant
Total_percent_of_broadly_compliantestablishments_includingnotyetrated Percentage of establishments, including those not yet rated, that are broadly compliant
Aratedestablishments Number of establishments in the area rated A (these have the greatest impact on public health)
Total_percent_of_broadly_compliantestablishments_a Percentage of A-rated establishments that are broadly compliant
Bratedestablishments Number of establishments in the area rated B
Total_percent_of_broadly_compliantestablishments_b Percentage of B-rated establishments that are broadly compliant
Cratedestablishments Number of establishments in the area rated C
Total_percent_of_broadly_compliantestablishments_c Percentage of C-rated establishments that are broadly compliant
Dratedestablishments Number of establishments in the area rated D
Total_percent_of_broadly_compliantestablishments_d Percentage of D-rated establishments that are broadly compliant
Eratedestablishments Number of establishments in the area rated E
Total_percent_of_broadly_compliantestablishments_e Percentage of E-rated establishments that are broadly compliant
Total_percent_of_interventionsachieved_premisesrated_a_e Percentage of interventions achieved for premises rated A to E combined
Total_percent_of_interventionsachieved_premisesrated_a Percentage of interventions achieved for premises rated A
Total_percent_of_interventionsachieved_premisesrated_b Percentage of interventions achieved for premises rated B
Total_percent_of_interventionsachieved_premisesrated_c Percentage of interventions achieved for premises rated C
Total_percent_of_interventionsachieved_premisesrated_d Percentage of interventions achieved for premises rated D
Total_percent_of_interventionsachieved_premisesrated_e Percentage of interventions achieved for premises rated E
Total_percent_of_interventionsachieved_premisesnotyetrated Percentage of interventions achieved for premises not yet rated
Totalnumberofestablishmentssubjecttoformalenforcementactions_voluntaryclosure Total number of establishments subject to formal enforcement action in the form of voluntary closure
Totalnumberofestablishmentssubjecttoformalenforcementactions_seizure_detention_surrenderoffood Total number of establishments subject to formal enforcement action in the form of seizure, detention or surrender of food
Totalnumberofestablishmentssubjecttoformalenforcementactions_suspension_revocationofapprovalorlicence Total number of establishments subject to formal enforcement action in the form of suspension or revocation of approval or licence
Totalnumberofestablishmentssubjecttoformalenforcementactions_hygieneemergencyprohibitionnotice Total number of establishments subject to formal enforcement action in the form of a Hygiene Emergency Prohibition Notice
Totalnumberofestablishmentssubjecttoformalenforcementactions_prohibitionorder Total number of establishments subject to formal enforcement action in the form of a prohibition order
Totalnumberofestablishmentssubjecttoformalenforcementactions_simplecaution Total number of establishments subject to formal enforcement action in the form of a simple caution
Totalnumberofestablishmentssubjecttoformalenforcementactions_hygieneimprovementnotices Total number of establishments subject to formal enforcement action in the form of hygiene improvement notices
Totalnumberofestablishmentssubjecttoformalenforcementactions_remedialaction_detentionnotices Total number of establishments subject to formal enforcement action in the form of remedial action or detention notices
Totalnumberofestablishmentssubjectto_writtenwarnings Total number of establishments subject to written warnings
Totalnumberofestablishmentssubjecttoformalenforcementactions_Prosecutionsconcluded Total number of establishments subject to formal enforcement action in the form of concluded prosecutions
Professional_full_time_equivalent_posts_occupied Number of professional full-time equivalent posts currently occupied in the local authority

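library(readr)    # read_csv()
library(dplyr)    # %>%, select(), mutate()
library(tidyr)    # replace_na()
library(janitor)  # clean_names()
library(ggplot2)  # plots
library(Hmisc)    # rcorr()

food_hygeine <- read_csv('2019-20-enforcement-data-food-hygiene.csv')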
## Rows: 353 Columns: 36
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (5): Country, LAType, LAName, Total%ofBroadlyCompliantestablishments-A,...
## dbl (30): Totalestablishments(includingnotyetrated&outside), Establishmentsn...
## num  (1): TotalnumberofestablishmentssubjecttoWrittenwarnings
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
food_hygeine <- food_hygeine %>% clean_names()

food_hygeine <- na.omit(food_hygeine)

food_hygeine$total_percent_of_interventionsachieved_premisesrated_a <- replace(food_hygeine$total_percent_of_interventionsachieved_premisesrated_a, food_hygeine$total_percent_of_interventionsachieved_premisesrated_a == 'NR', 0)

food_hygeine$total_percent_of_interventionsachieved_premisesrated_a <- as.numeric(food_hygeine$total_percent_of_interventionsachieved_premisesrated_a)

food_hygeine$country <- as.factor(food_hygeine$country)
food_hygeine$la_type <- as.factor(food_hygeine$la_type)
food_hygeine$la_name <- as.factor(food_hygeine$la_name)



#summary(food_hygeine)

# The summary shows 24 NAs in the percentage of interventions achieved for premises rated A. We replace them with the mean of that column.

avg_premisesrateda <- mean(food_hygeine$total_percent_of_interventionsachieved_premisesrated_a, na.rm = TRUE)

food_hygeine <- replace_na(food_hygeine, list(total_percent_of_interventionsachieved_premisesrated_a = avg_premisesrateda))
#x1 <- na.omit(food_hygeine)
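
The cleaning steps above (recode the 'NR' marker, convert to numeric, fill the remaining NAs with the column mean) can be sketched on a toy vector. The values below are made up for illustration and are not taken from the data set, and base R indexing is used in place of `replace_na()` so that the sketch runs without extra packages.

```r
# Toy column with an 'NR' marker and a missing value (hypothetical values)
pct_raw <- c("100", "NR", "80", NA, "60")

# Step 1: recode the 'NR' marker as "0", mirroring the analysis above.
# which() skips the NA entry, so it stays missing for now.
pct_raw[which(pct_raw == "NR")] <- "0"

# Step 2: convert the character column to numeric
pct <- as.numeric(pct_raw)

# Step 3: replace the remaining NAs with the mean of the observed values
avg_pct <- mean(pct, na.rm = TRUE)
pct[is.na(pct)] <- avg_pct

pct
```

Because 'NR' is recoded as 0 before the mean is taken, those zeros pull the imputed value down. Recoding 'NR' as NA instead would be one alternative, at the cost of imputing more values.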

1.1 Distribution of the % of enforcement actions successfully achieved

1.1.1 Distribution for all levels combined

ggplot(data = food_hygeine, aes(x = total_percent_of_interventionsachieved_premisesrated_a_e, fill = country)) +
  geom_histogram(binwidth = 1) +
  labs(x = "Percentage %", y = "Count", title = "Distribution of Successful Enforcement Action Percentage for All Levels A-E")

1.1.2 Distribution for levels A-E separately

# Histograms of the percentage of interventions achieved for each premises rating, A to E
ggplot(data = food_hygeine, aes(x = total_percent_of_interventionsachieved_premisesrated_a)) +
  geom_histogram(binwidth = 1, colour = "black", fill = "orange") +
  labs(x = 'Interventions Achieved A (in %)', y = 'Count', title = "Distribution of Successful Enforcement Action Percentage for Level A") +
  scale_y_continuous(limits = c(0, 400), breaks = seq(0, 400, 25), expand = c(0, 0))

ggplot(data = food_hygeine, aes(x = total_percent_of_interventionsachieved_premisesrated_b)) +
  geom_histogram(binwidth = 1, colour = "black", fill = "orange") +
  labs(x = 'Interventions Achieved B (in %)', y = 'Count', title = "Distribution of Successful Enforcement Action Percentage for Level B") +
  scale_y_continuous(breaks = seq(0, 350, 25), expand = c(0, 0))

ggplot(data = food_hygeine, aes(x = total_percent_of_interventionsachieved_premisesrated_c)) +
  geom_histogram(binwidth = 1, colour = "black", fill = "orange") +
  labs(x = 'Interventions Achieved C (in %)', y = 'Count', title = "Distribution of Successful Enforcement Action Percentage for Level C") +
  scale_y_continuous(breaks = seq(0, 200, 25), expand = c(0, 0))

ggplot(data = food_hygeine, aes(x = total_percent_of_interventionsachieved_premisesrated_d)) +
  geom_histogram(binwidth = 1, colour = "black", fill = "orange") +
  labs(x = 'Interventions Achieved D (in %)', y = 'Count', title = "Distribution of Successful Enforcement Action Percentage for Level D") +
  scale_y_continuous(breaks = seq(0, 350, 25), expand = c(0, 0))

ggplot(data = food_hygeine, aes(x = total_percent_of_interventionsachieved_premisesrated_e)) +
  geom_histogram(binwidth = 1, colour = "black", fill = "orange") +
  labs(x = 'Interventions Achieved E (in %)', y = 'Count', title = "Distribution of Successful Enforcement Action Percentage for Level E") +
  scale_y_continuous(breaks = seq(0, 350, 25), expand = c(0, 0))

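The five per-rating histograms share the same structure, so one possible alternative is to reshape the columns to long format and draw a single faceted plot. The sketch below uses a small made-up data frame (`toy`) with only two of the five columns. With the real data the rating columns would need to be selected explicitly, since `everything()` would also pick up unrelated columns.

```r
library(ggplot2)
library(tidyr)

# Hypothetical stand-in for food_hygeine, with two rating columns only
toy <- data.frame(
  total_percent_of_interventionsachieved_premisesrated_a = c(100, 95, 100, 80),
  total_percent_of_interventionsachieved_premisesrated_b = c(90, 100, 85, 100)
)

# One row per (local authority, rating) pair; the shared prefix is stripped
# so that the rating column holds just "a", "b", ...
toy_long <- pivot_longer(
  toy,
  cols = everything(),
  names_to = "rating",
  names_prefix = "total_percent_of_interventionsachieved_premisesrated_",
  values_to = "percent_achieved"
)

# One histogram panel per rating
p <- ggplot(toy_long, aes(x = percent_achieved)) +
  geom_histogram(binwidth = 1, colour = "black", fill = "orange") +
  facet_wrap(~ rating) +
  labs(x = "Interventions achieved (in %)", y = "Count")
```

Printing `p` draws the panels side by side on a common y-axis, which makes the ratings easier to compare than five separate figures.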
1.2 Relationship between proportion of successful responses and the number of employees

food_hygeine_correlation <- rcorr(
  as.matrix(select(
    food_hygeine,
    total_percent_of_interventionsachieved_premisesrated_e,
    total_percent_of_interventionsachieved_premisesrated_d,
    total_percent_of_interventionsachieved_premisesrated_c,
    total_percent_of_interventionsachieved_premisesrated_b,
    total_percent_of_interventionsachieved_premisesrated_a,
    total_percent_of_interventionsachieved_premisesrated_a_e,
    professional_full_time_equivalent_posts_occupied
  )),
  type = "spearman"
)

food_hygeine_correlation
##                                                          total_percent_of_interventionsachieved_premisesrated_e
## total_percent_of_interventionsachieved_premisesrated_e                                                     1.00
## total_percent_of_interventionsachieved_premisesrated_d                                                     0.55
## total_percent_of_interventionsachieved_premisesrated_c                                                     0.38
## total_percent_of_interventionsachieved_premisesrated_b                                                     0.28
## total_percent_of_interventionsachieved_premisesrated_a                                                     0.05
## total_percent_of_interventionsachieved_premisesrated_a_e                                                   0.81
## professional_full_time_equivalent_posts_occupied                                                           0.03
##                                                          total_percent_of_interventionsachieved_premisesrated_d
## total_percent_of_interventionsachieved_premisesrated_e                                                     0.55
## total_percent_of_interventionsachieved_premisesrated_d                                                     1.00
## total_percent_of_interventionsachieved_premisesrated_c                                                     0.71
## total_percent_of_interventionsachieved_premisesrated_b                                                     0.45
## total_percent_of_interventionsachieved_premisesrated_a                                                     0.03
## total_percent_of_interventionsachieved_premisesrated_a_e                                                   0.85
## professional_full_time_equivalent_posts_occupied                                                          -0.09
##                                                          total_percent_of_interventionsachieved_premisesrated_c
## total_percent_of_interventionsachieved_premisesrated_e                                                     0.38
## total_percent_of_interventionsachieved_premisesrated_d                                                     0.71
## total_percent_of_interventionsachieved_premisesrated_c                                                     1.00
## total_percent_of_interventionsachieved_premisesrated_b                                                     0.47
## total_percent_of_interventionsachieved_premisesrated_a                                                     0.07
## total_percent_of_interventionsachieved_premisesrated_a_e                                                   0.69
## professional_full_time_equivalent_posts_occupied                                                          -0.05
##                                                          total_percent_of_interventionsachieved_premisesrated_b
## total_percent_of_interventionsachieved_premisesrated_e                                                     0.28
## total_percent_of_interventionsachieved_premisesrated_d                                                     0.45
## total_percent_of_interventionsachieved_premisesrated_c                                                     0.47
## total_percent_of_interventionsachieved_premisesrated_b                                                     1.00
## total_percent_of_interventionsachieved_premisesrated_a                                                     0.11
## total_percent_of_interventionsachieved_premisesrated_a_e                                                   0.45
## professional_full_time_equivalent_posts_occupied                                                           0.04
##                                                          total_percent_of_interventionsachieved_premisesrated_a
## total_percent_of_interventionsachieved_premisesrated_e                                                     0.05
## total_percent_of_interventionsachieved_premisesrated_d                                                     0.03
## total_percent_of_interventionsachieved_premisesrated_c                                                     0.07
## total_percent_of_interventionsachieved_premisesrated_b                                                     0.11
## total_percent_of_interventionsachieved_premisesrated_a                                                     1.00
## total_percent_of_interventionsachieved_premisesrated_a_e                                                   0.03
## professional_full_time_equivalent_posts_occupied                                                           0.12
##                                                          total_percent_of_interventionsachieved_premisesrated_a_e
## total_percent_of_interventionsachieved_premisesrated_e                                                       0.81
## total_percent_of_interventionsachieved_premisesrated_d                                                       0.85
## total_percent_of_interventionsachieved_premisesrated_c                                                       0.69
## total_percent_of_interventionsachieved_premisesrated_b                                                       0.45
## total_percent_of_interventionsachieved_premisesrated_a                                                       0.03
## total_percent_of_interventionsachieved_premisesrated_a_e                                                     1.00
## professional_full_time_equivalent_posts_occupied                                                             0.00
##                                                          professional_full_time_equivalent_posts_occupied
## total_percent_of_interventionsachieved_premisesrated_e                                               0.03
## total_percent_of_interventionsachieved_premisesrated_d                                              -0.09
## total_percent_of_interventionsachieved_premisesrated_c                                              -0.05
## total_percent_of_interventionsachieved_premisesrated_b                                               0.04
## total_percent_of_interventionsachieved_premisesrated_a                                               0.12
## total_percent_of_interventionsachieved_premisesrated_a_e                                             0.00
## professional_full_time_equivalent_posts_occupied                                                     1.00
## 
## n= 347 
## 
## 
## P
##                                                          total_percent_of_interventionsachieved_premisesrated_e
## total_percent_of_interventionsachieved_premisesrated_e                                                         
## total_percent_of_interventionsachieved_premisesrated_d   0.0000                                                
## total_percent_of_interventionsachieved_premisesrated_c   0.0000                                                
## total_percent_of_interventionsachieved_premisesrated_b   0.0000                                                
## total_percent_of_interventionsachieved_premisesrated_a   0.3941                                                
## total_percent_of_interventionsachieved_premisesrated_a_e 0.0000                                                
## professional_full_time_equivalent_posts_occupied         0.6158                                                
##                                                          total_percent_of_interventionsachieved_premisesrated_d
## total_percent_of_interventionsachieved_premisesrated_e   0.0000                                                
## total_percent_of_interventionsachieved_premisesrated_d                                                         
## total_percent_of_interventionsachieved_premisesrated_c   0.0000                                                
## total_percent_of_interventionsachieved_premisesrated_b   0.0000                                                
## total_percent_of_interventionsachieved_premisesrated_a   0.5895                                                
## total_percent_of_interventionsachieved_premisesrated_a_e 0.0000                                                
## professional_full_time_equivalent_posts_occupied         0.0852                                                
##                                                          total_percent_of_interventionsachieved_premisesrated_c
## total_percent_of_interventionsachieved_premisesrated_e   0.0000                                                
## total_percent_of_interventionsachieved_premisesrated_d   0.0000                                                
## total_percent_of_interventionsachieved_premisesrated_c                                                         
## total_percent_of_interventionsachieved_premisesrated_b   0.0000                                                
## total_percent_of_interventionsachieved_premisesrated_a   0.2053                                                
## total_percent_of_interventionsachieved_premisesrated_a_e 0.0000                                                
## professional_full_time_equivalent_posts_occupied         0.3737                                                
##                                                          total_percent_of_interventionsachieved_premisesrated_b
## total_percent_of_interventionsachieved_premisesrated_e   0.0000                                                
## total_percent_of_interventionsachieved_premisesrated_d   0.0000                                                
## total_percent_of_interventionsachieved_premisesrated_c   0.0000                                                
## total_percent_of_interventionsachieved_premisesrated_b                                                         
## total_percent_of_interventionsachieved_premisesrated_a   0.0505                                                
## total_percent_of_interventionsachieved_premisesrated_a_e 0.0000                                                
## professional_full_time_equivalent_posts_occupied         0.4371                                                
##                                                          total_percent_of_interventionsachieved_premisesrated_a
## total_percent_of_interventionsachieved_premisesrated_e   0.3941                                                
## total_percent_of_interventionsachieved_premisesrated_d   0.5895                                                
## total_percent_of_interventionsachieved_premisesrated_c   0.2053                                                
## total_percent_of_interventionsachieved_premisesrated_b   0.0505                                                
## total_percent_of_interventionsachieved_premisesrated_a                                                         
## total_percent_of_interventionsachieved_premisesrated_a_e 0.5900                                                
## professional_full_time_equivalent_posts_occupied         0.0211                                                
##                                                          total_percent_of_interventionsachieved_premisesrated_a_e
## total_percent_of_interventionsachieved_premisesrated_e   0.0000                                                  
## total_percent_of_interventionsachieved_premisesrated_d   0.0000                                                  
## total_percent_of_interventionsachieved_premisesrated_c   0.0000                                                  
## total_percent_of_interventionsachieved_premisesrated_b   0.0000                                                  
## total_percent_of_interventionsachieved_premisesrated_a   0.5900                                                  
## total_percent_of_interventionsachieved_premisesrated_a_e                                                         
## professional_full_time_equivalent_posts_occupied         0.9771                                                  
##                                                          professional_full_time_equivalent_posts_occupied
## total_percent_of_interventionsachieved_premisesrated_e   0.6158                                          
## total_percent_of_interventionsachieved_premisesrated_d   0.0852                                          
## total_percent_of_interventionsachieved_premisesrated_c   0.3737                                          
## total_percent_of_interventionsachieved_premisesrated_b   0.4371                                          
## total_percent_of_interventionsachieved_premisesrated_a   0.0211                                          
## total_percent_of_interventionsachieved_premisesrated_a_e 0.9771                                          
## professional_full_time_equivalent_posts_occupied
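
The matrix above is computed with `type = "spearman"`, that is, a rank correlation. As a minimal base R sketch of what that means, the toy vectors below (made up, not from the data set) show that Spearman's coefficient equals the Pearson correlation of the ranks.

```r
# Hypothetical toy data: staffing levels and a percentage outcome
posts <- c(1, 2, 3, 4, 5)
pct   <- c(2, 1, 4, 3, 5)

# Spearman's rho computed directly
rho_spearman <- cor(posts, pct, method = "spearman")

# The same value obtained as the Pearson correlation of the ranks
rho_ranks <- cor(rank(posts), rank(pct))

# Closed form for untied data: 1 - 6 * sum(d^2) / (n * (n^2 - 1))
d <- rank(posts) - rank(pct)
n <- length(posts)
rho_formula <- 1 - 6 * sum(d^2) / (n * (n^2 - 1))

c(rho_spearman, rho_ranks, rho_formula)
```

Because only ranks enter the calculation, the coefficient is not affected by the long left tail visible in the percentage variables.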
## Scatter plot and linear regression of the percentage of interventions achieved (all ratings A-E) against the number of employees
ggplot(food_hygeine, aes(x = professional_full_time_equivalent_posts_occupied, y = total_percent_of_interventionsachieved_premisesrated_a_e)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(x = "No. of Employees", y = "All Successful Premises Enforcement Achieved (in %)")

FullTimeEmployeesAtoE <- lm(total_percent_of_interventionsachieved_premisesrated_a_e ~ professional_full_time_equivalent_posts_occupied, data = food_hygeine)

FullTimeEmployeesAtoE
## 
## Call:
## lm(formula = total_percent_of_interventionsachieved_premisesrated_a_e ~ 
##     professional_full_time_equivalent_posts_occupied, data = food_hygeine)
## 
## Coefficients:
##                                      (Intercept)  
##                                          87.1091  
## professional_full_time_equivalent_posts_occupied  
##                                          -0.1195
summary(FullTimeEmployeesAtoE)
## 
## Call:
## lm(formula = total_percent_of_interventionsachieved_premisesrated_a_e ~ 
##     professional_full_time_equivalent_posts_occupied, data = food_hygeine)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -66.304  -4.575   4.067   8.658  13.860 
## 
## Coefficients:
##                                                  Estimate Std. Error t value
## (Intercept)                                       87.1091     1.2828  67.905
## professional_full_time_equivalent_posts_occupied  -0.1195     0.2675  -0.447
##                                                  Pr(>|t|)    
## (Intercept)                                        <2e-16 ***
## professional_full_time_equivalent_posts_occupied    0.655    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 12.4 on 345 degrees of freedom
## Multiple R-squared:  0.0005787,  Adjusted R-squared:  -0.002318 
## F-statistic: 0.1998 on 1 and 345 DF,  p-value: 0.6552
cbind(coeffcient=coef(FullTimeEmployeesAtoE), confint(FullTimeEmployeesAtoE))
##                                                  coeffcient      2.5 %
## (Intercept)                                      87.1091495 84.5860343
## professional_full_time_equivalent_posts_occupied -0.1195469 -0.6456029
##                                                      97.5 %
## (Intercept)                                      89.6322647
## professional_full_time_equivalent_posts_occupied  0.4065092
FullTimeEmployeesA <- lm(total_percent_of_interventionsachieved_premisesrated_a ~ professional_full_time_equivalent_posts_occupied, data = food_hygeine)

FullTimeEmployeesB <- lm(total_percent_of_interventionsachieved_premisesrated_b ~ professional_full_time_equivalent_posts_occupied, data = food_hygeine)

FullTimeEmployeesC <- lm(total_percent_of_interventionsachieved_premisesrated_c ~ professional_full_time_equivalent_posts_occupied, data = food_hygeine)

FullTimeEmployeesD <- lm(total_percent_of_interventionsachieved_premisesrated_d ~ professional_full_time_equivalent_posts_occupied, data = food_hygeine)

FullTimeEmployeesE <- lm(total_percent_of_interventionsachieved_premisesrated_e ~ professional_full_time_equivalent_posts_occupied, data = food_hygeine)

##Adjusted R-squared:  -0.002318 
##F-statistic: 0.1998 on 1 and 345 DF,  p-value: 0.6552
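
The per-rating models above are fitted one by one. A compact way to collect the slope and its p-value for several outcomes at once is sketched below. The data frame `toy` and its column names (`posts`, `pct_a`, `pct_b`) are hypothetical stand-ins, not the report's variables or results.

```r
# Hypothetical toy data: one predictor and two outcome columns
toy <- data.frame(
  posts = c(2, 3, 5, 4, 6, 8, 7, 9),
  pct_a = c(90, 92, 91, 95, 94, 97, 96, 99),
  pct_b = c(88, 85, 90, 86, 91, 87, 92, 89)
)

outcomes <- c("pct_a", "pct_b")

# Fit outcome ~ posts for each outcome and keep the slope and its p-value
results <- sapply(outcomes, function(v) {
  fit <- lm(reformulate("posts", response = v), data = toy)
  c(
    slope   = unname(coef(fit)["posts"]),
    p_value = summary(fit)$coefficients["posts", "Pr(>|t|)"]
  )
})

results
```

`results` is a matrix with one column per outcome, so the rating levels can be compared side by side without printing each full `summary()`.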

1.3 Relationship between proportion of successful responses and the number of employees as a proportion of the number of establishments

food_hygeine_new <- food_hygeine %>%
  mutate(
    total_rated_establishments = totalestablishments_includingnotyetrated_outside -
      establishmentsnotyetratedforintervention -
      establishmentsoutsidetheprogramme,
    Employees_Proportion = round((professional_full_time_equivalent_posts_occupied / total_rated_establishments) * 100, 2)
  )
FTemployeesAtoE <- lm(total_percent_of_interventionsachieved_premisesrated_a_e ~ Employees_Proportion, data = food_hygeine_new)

summary(FTemployeesAtoE )
## 
## Call:
## lm(formula = total_percent_of_interventionsachieved_premisesrated_a_e ~ 
##     Employees_Proportion, data = food_hygeine_new)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -61.045  -5.352   4.403   8.379  15.901 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            79.512      1.932  41.149  < 2e-16 ***
## Employees_Proportion   24.142      6.180   3.907 0.000113 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 12.13 on 345 degrees of freedom
## Multiple R-squared:  0.04236,    Adjusted R-squared:  0.03959 
## F-statistic: 15.26 on 1 and 345 DF,  p-value: 0.0001126
cbind(coeffcient=coef(FTemployeesAtoE ), confint(FTemployeesAtoE))
##                      coeffcient    2.5 %   97.5 %
## (Intercept)            79.51215 75.71161 83.31269
## Employees_Proportion   24.14159 11.98709 36.29608
##Adjusted R-squared:  0.03959 
##F-statistic: 15.26 on 1 and 345 DF,  p-value: 0.0001126
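
To read the slope of this model, recall that `Employees_Proportion` is the number of posts per 100 rated establishments. The sketch below plugs the fitted coefficients from the output above into the regression line for two hypothetical local authorities. The staffing figures are invented for illustration.

```r
# Coefficients copied from the fitted model output above
intercept <- 79.51215
slope     <- 24.14159

# Employees_Proportion as defined in the mutate() step:
# posts per 100 rated establishments, rounded to 2 decimals
employees_proportion <- function(posts, rated_establishments) {
  round(posts / rated_establishments * 100, 2)
}

# Two hypothetical authorities with 1000 rated establishments each
prop_low  <- employees_proportion(2, 1000)  # 0.2
prop_high <- employees_proportion(4, 1000)  # 0.4

# Predicted percentage of interventions achieved (A-E) on the fitted line
pred_low  <- intercept + slope * prop_low
pred_high <- intercept + slope * prop_high

# The difference equals slope * (prop_high - prop_low)
pred_high - pred_low
```

Under this model, doubling staffing from 2 to 4 posts per 1000 establishments shifts the predicted percentage by `slope * 0.2`. The low R-squared reported above means individual authorities scatter widely around that line.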

FTemployeesforA <- lm(total_percent_of_interventionsachieved_premisesrated_a ~ Employees_Proportion, data = food_hygeine_new)

summary(FTemployeesforA)
## 
## Call:
## lm(formula = total_percent_of_interventionsachieved_premisesrated_a ~ 
##     Employees_Proportion, data = food_hygeine_new)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -91.482   8.411   8.696   8.802   9.105 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            90.771      4.138  21.935   <2e-16 ***
## Employees_Proportion    1.778     13.234   0.134    0.893    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 25.99 on 345 degrees of freedom
## Multiple R-squared:  5.232e-05,  Adjusted R-squared:  -0.002846 
## F-statistic: 0.01805 on 1 and 345 DF,  p-value: 0.8932
cbind(coeffcient=coef(FTemployeesforA), confint(FTemployeesforA))
##                      coeffcient     2.5 %   97.5 %
## (Intercept)           90.770882  82.63173 98.91003
## Employees_Proportion   1.778014 -24.25176 27.80779
##Adjusted R-squared:  -0.002846 
##F-statistic: 0.01805 on 1 and 345 DF,  p-value: 0.8932

FTemployeesforB <- lm(total_percent_of_interventionsachieved_premisesrated_b ~ Employees_Proportion, data = food_hygeine_new)

summary(FTemployeesforB)
## 
## Call:
## lm(formula = total_percent_of_interventionsachieved_premisesrated_b ~ 
##     Employees_Proportion, data = food_hygeine_new)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -45.281  -1.647   2.574   4.616   4.938 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            94.893      1.089  87.175   <2e-16 ***
## Employees_Proportion    1.213      3.481   0.348    0.728    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.836 on 345 degrees of freedom
## Multiple R-squared:  0.0003518,  Adjusted R-squared:  -0.002546 
## F-statistic: 0.1214 on 1 and 345 DF,  p-value: 0.7277
cbind(coeffcient=coef(FTemployeesforB), confint(FTemployeesforB))
##                      coeffcient     2.5 %    97.5 %
## (Intercept)           94.892512 92.751533 97.033491
## Employees_Proportion   1.213003 -5.634052  8.060059
##Adjusted R-squared:  -0.002546 
##F-statistic: 0.1214 on 1 and 345 DF,  p-value: 0.7277

FTemployeesforC <- lm(total_percent_of_interventionsachieved_premisesrated_c ~ Employees_Proportion, data = food_hygeine_new)

summary(FTemployeesforC)
## 
## Call:
## lm(formula = total_percent_of_interventionsachieved_premisesrated_c ~ 
##     Employees_Proportion, data = food_hygeine_new)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -71.059  -2.712   3.239   5.865  10.098 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            88.365      1.464  60.366   <2e-16 ***
## Employees_Proportion   11.814      4.681   2.524   0.0121 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.193 on 345 degrees of freedom
## Multiple R-squared:  0.01812,    Adjusted R-squared:  0.01528 
## F-statistic: 6.368 on 1 and 345 DF,  p-value: 0.01207
cbind(coefficient=coef(FTemployeesforC), confint(FTemployeesforC))
##                      coefficient     2.5 %   97.5 %
## (Intercept)             88.36538 85.486223 91.24454
## Employees_Proportion    11.81402  2.606194 21.02185
##Adjusted R-squared:  0.01528 
##F-statistic: 6.368 on 1 and 345 DF,  p-value: 0.01207

FTemployeesforD <- lm(total_percent_of_interventionsachieved_premisesrated_d ~ Employees_Proportion, data = food_hygeine_new)

summary(FTemployeesforD)
## 
## Call:
## lm(formula = total_percent_of_interventionsachieved_premisesrated_d ~ 
##     Employees_Proportion, data = food_hygeine_new)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -67.764  -4.461   5.055   9.230  15.528 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            81.235      2.283  35.582   <2e-16 ***
## Employees_Proportion   17.281      7.301   2.367   0.0185 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 14.34 on 345 degrees of freedom
## Multiple R-squared:  0.01598,    Adjusted R-squared:  0.01312 
## F-statistic: 5.602 on 1 and 345 DF,  p-value: 0.0185
cbind(coefficient=coef(FTemployeesforD), confint(FTemployeesforD))
##                      coefficient     2.5 %   97.5 %
## (Intercept)             81.23538 76.744986 85.72578
## Employees_Proportion    17.28058  2.919861 31.64130
##Adjusted R-squared:  0.01312 
##F-statistic: 5.602 on 1 and 345 DF,  p-value: 0.0185

FTemployeesforE <- lm(total_percent_of_interventionsachieved_premisesrated_e ~ Employees_Proportion, data = food_hygeine_new)

summary(FTemployeesforE)
## 
## Call:
## lm(formula = total_percent_of_interventionsachieved_premisesrated_e ~ 
##     Employees_Proportion, data = food_hygeine_new)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -76.912 -12.654   8.201  17.710  29.396 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            64.215      3.764  17.061  < 2e-16 ***
## Employees_Proportion   44.679     12.037   3.712  0.00024 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 23.64 on 345 degrees of freedom
## Multiple R-squared:  0.0384, Adjusted R-squared:  0.03561 
## F-statistic: 13.78 on 1 and 345 DF,  p-value: 0.0002398
cbind(coefficient=coef(FTemployeesforE), confint(FTemployeesforE))
##                      coefficient    2.5 %   97.5 %
## (Intercept)             64.21526 56.81236 71.61816
## Employees_Proportion    44.67894 21.00378 68.35411
##Adjusted R-squared:  0.03561 
##F-statistic: 13.78 on 1 and 345 DF,  p-value: 0.0002398
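
The models above repeat the same `lm()` call with a different response each time. A minimal sketch of fitting such models in one loop and collecting the slope, its 95% confidence interval and its p-value into a single table is given below. The data frame `toy` and its columns `achieved_b` and `achieved_c` are synthetic stand-ins invented for illustration; they are not the report's data.

```r
# Synthetic stand-in for food_hygeine_new (all values are made up)
set.seed(1)
n <- 50
toy <- data.frame(
  Employees_Proportion = runif(n, 0, 0.6),
  achieved_b = pmin(100, rnorm(n, mean = 95, sd = 5)),
  achieved_c = pmin(100, rnorm(n, mean = 90, sd = 8))
)
responses <- c("achieved_b", "achieved_c")

# Fit response ~ Employees_Proportion and return one summary row
fit_one <- function(response) {
  model <- lm(reformulate("Employees_Proportion", response = response), data = toy)
  ci <- confint(model)["Employees_Proportion", ]
  data.frame(
    response = response,
    slope    = unname(coef(model)["Employees_Proportion"]),
    lower    = unname(ci[1]),
    upper    = unname(ci[2]),
    p_value  = summary(model)$coefficients["Employees_Proportion", "Pr(>|t|)"]
  )
}

# One row per response, stacked into a single data frame
slope_table <- do.call(rbind, lapply(responses, fit_one))
print(slope_table)
```

Keeping the slopes side by side makes it easier to compare the ratings than reading several separate `summary()` printouts.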
food_hygeine$total_percent_of_interventionsachieved_premisesrated_a = as.numeric(food_hygeine$total_percent_of_interventionsachieved_premisesrated_a)

str(food_hygeine)
## tibble [347 × 36] (S3: tbl_df/tbl/data.frame)
##  $ country                                                                                              : Factor w/ 3 levels "England","Northern Ireland",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ la_type                                                                                              : Factor w/ 6 levels "District Council",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ la_name                                                                                              : Factor w/ 347 levels "Adur and Worthing",..: 1 2 3 8 9 10 11 12 16 17 ...
##  $ totalestablishments_includingnotyetrated_outside                                                     : num [1:347] 1478 1316 1112 1208 905 ...
##  $ establishmentsnotyetratedforintervention                                                             : num [1:347] 24 29 1 44 26 0 58 40 41 84 ...
##  $ establishmentsoutsidetheprogramme                                                                    : num [1:347] 0 74 0 1 1 0 214 39 0 42 ...
##  $ total_percent_of_broadly_compliantestablishmentsrated_a_e                                            : num [1:347] 97.2 97.2 97.5 97.7 96.7 ...
##  $ total_percent_of_broadly_compliantestablishments_includingnotyetrated                                : num [1:347] 95.6 94.9 97.4 94.1 93.9 ...
##  $ aratedestablishments                                                                                 : num [1:347] 3 2 2 3 1 5 1 4 1 4 ...
##  $ total_percent_of_broadly_compliantestablishments_a                                                   : chr [1:347] "33.33" "50" "50" "0" ...
##  $ bratedestablishments                                                                                 : num [1:347] 39 26 39 28 31 15 20 44 31 36 ...
##  $ total_percent_of_broadly_compliantestablishments_b                                                   : num [1:347] 69.2 76.9 64.1 82.1 77.4 ...
##  $ cratedestablishments                                                                                 : num [1:347] 227 243 179 211 145 125 270 219 96 190 ...
##  $ total_percent_of_broadly_compliantestablishments_c                                                   : num [1:347] 91.2 90.1 93.8 94.3 89.7 ...
##  $ dratedestablishments                                                                                 : num [1:347] 592 469 432 483 353 453 555 626 186 519 ...
##  $ total_percent_of_broadly_compliantestablishments_d                                                   : num [1:347] 99 99.4 99.5 98.5 98.3 ...
##  $ eratedestablishments                                                                                 : num [1:347] 593 473 459 438 348 534 628 1030 219 525 ...
##  $ total_percent_of_broadly_compliantestablishments_e                                                   : num [1:347] 99.8 100 100 100 100 ...
##  $ total_percent_of_interventionsachieved_premisesrated_a_e                                             : num [1:347] 96.1 90.6 88.9 94 80.7 ...
##  $ total_percent_of_interventionsachieved_premisesrated_a                                               : num [1:347] 100 100 100 100 60 80 100 100 100 100 ...
##  $ total_percent_of_interventionsachieved_premisesrated_b                                               : num [1:347] 100 98.3 95.1 96.3 100 ...
##  $ total_percent_of_interventionsachieved_premisesrated_c                                               : num [1:347] 95.5 89.7 97 94.4 78.8 ...
##  $ total_percent_of_interventionsachieved_premisesrated_d                                               : num [1:347] 96 93 91.8 92.6 85.3 ...
##  $ total_percent_of_interventionsachieved_premisesrated_e                                               : num [1:347] 94 85.1 72.3 95.5 68.3 ...
##  $ total_percent_of_interventionsachieved_premisesnotyetrated                                           : num [1:347] 100 100 100 95.4 79.6 ...
##  $ totalnumberofestablishmentssubjecttoformalenforcementactions_voluntaryclosure                        : num [1:347] 5 0 0 2 1 0 0 0 0 0 ...
##  $ totalnumberofestablishmentssubjecttoformalenforcementactions_seizure_detention_surrenderoffood       : num [1:347] 4 0 0 0 0 0 0 0 0 0 ...
##  $ totalnumberofestablishmentssubjecttoformalenforcementactions_suspension_revocationofapprovalorlicence: num [1:347] 0 0 0 0 0 0 1 0 0 0 ...
##  $ totalnumberofestablishmentssubjecttoformalenforcementactions_hygieneemergencyprohibitionnotice       : num [1:347] 0 0 0 0 0 0 0 0 0 0 ...
##  $ totalnumberofestablishmentssubjecttoformalenforcementactions_prohibitionorder                        : num [1:347] 0 0 0 0 0 0 1 0 0 0 ...
##  $ totalnumberofestablishmentssubjecttoformalenforcementactions_simplecaution                           : num [1:347] 0 0 1 0 0 0 0 0 0 0 ...
##  $ totalnumberofestablishmentssubjecttoformalenforcementactions_hygieneimprovementnotices               : num [1:347] 3 6 11 3 4 0 3 2 1 2 ...
##  $ totalnumberofestablishmentssubjecttoformalenforcementactions_remedialaction_detentionnotices         : num [1:347] 0 0 0 0 0 0 0 0 0 0 ...
##  $ totalnumberofestablishmentssubjectto_writtenwarnings                                                 : num [1:347] 323 413 515 386 252 224 223 152 179 175 ...
##  $ totalnumberofestablishmentssubjecttoformalenforcementactions_prosecutionsconcluded                   : num [1:347] 0 0 1 0 0 0 0 2 0 0 ...
##  $ professional_full_time_equivalent_posts_occupied                                                     : num [1:347] 5 4 3.5 4 2 4.65 2.5 5 2 4.2 ...
##  - attr(*, "na.action")= 'omit' Named int [1:6] 21 36 52 88 163 261
##   ..- attr(*, "names")= chr [1:6] "21" "36" "52" "88" ...

Question 1 Section 2:

Part 1: Distribution of successful response percentages

The figure shows the distribution of the percentage of interventions achieved across all establishments rated A to E in the three countries: England, Northern Ireland and Wales. For most local authorities the percentage lies in the range 90-100%, which tells us that local authorities are highly effective at carrying out their interventions.

We also plotted a separate histogram for each rating from A to E to show how effective local authorities are within each rating. The success rate of interventions is highest for A-rated establishments, followed by B, C, D and E. All of the histograms peak on the right, which suggests that the large majority of local authorities (more than 75%) are successful in implementing their interventions.
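
To back a reading such as "the peak is on the right" with numbers, one option is to compute the share of local authorities above a threshold and to bin the percentages. The sketch below does this on simulated data: only the number of local authorities (347) comes from the report, while the columns `rated_a` and `rated_e` and all their values are assumptions made up for illustration.

```r
# Simulated percentages of interventions achieved (not the report's data)
set.seed(42)
n <- 347  # number of local authorities reported above
toy <- data.frame(
  rated_a = pmax(0, pmin(100, rnorm(n, mean = 97, sd = 6))),
  rated_e = pmax(0, pmin(100, rnorm(n, mean = 80, sd = 20)))
)

# Share of local authorities achieving more than 75% of interventions
share_above_75 <- sapply(toy, function(x) mean(x > 75))
print(round(share_above_75, 2))

# Counts per 25-point bin, i.e. a textual version of each histogram
binned <- sapply(toy, function(x) {
  table(cut(x, breaks = seq(0, 100, by = 25), include.lowest = TRUE))
})
print(binned)
```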

##                                                          total_percent_of_interventionsachieved_premisesrated_e
## total_percent_of_interventionsachieved_premisesrated_e                                                     1.00
## total_percent_of_interventionsachieved_premisesrated_d                                                     0.55
## total_percent_of_interventionsachieved_premisesrated_c                                                     0.38
## total_percent_of_interventionsachieved_premisesrated_b                                                     0.28
## total_percent_of_interventionsachieved_premisesrated_a                                                     0.05
## total_percent_of_interventionsachieved_premisesrated_a_e                                                   0.81
## professional_full_time_equivalent_posts_occupied                                                           0.03
##                                                          total_percent_of_interventionsachieved_premisesrated_d
## total_percent_of_interventionsachieved_premisesrated_e                                                     0.55
## total_percent_of_interventionsachieved_premisesrated_d                                                     1.00
## total_percent_of_interventionsachieved_premisesrated_c                                                     0.71
## total_percent_of_interventionsachieved_premisesrated_b                                                     0.45
## total_percent_of_interventionsachieved_premisesrated_a                                                     0.03
## total_percent_of_interventionsachieved_premisesrated_a_e                                                   0.85
## professional_full_time_equivalent_posts_occupied                                                          -0.09
##                                                          total_percent_of_interventionsachieved_premisesrated_c
## total_percent_of_interventionsachieved_premisesrated_e                                                     0.38
## total_percent_of_interventionsachieved_premisesrated_d                                                     0.71
## total_percent_of_interventionsachieved_premisesrated_c                                                     1.00
## total_percent_of_interventionsachieved_premisesrated_b                                                     0.47
## total_percent_of_interventionsachieved_premisesrated_a                                                     0.07
## total_percent_of_interventionsachieved_premisesrated_a_e                                                   0.69
## professional_full_time_equivalent_posts_occupied                                                          -0.05
##                                                          total_percent_of_interventionsachieved_premisesrated_b
## total_percent_of_interventionsachieved_premisesrated_e                                                     0.28
## total_percent_of_interventionsachieved_premisesrated_d                                                     0.45
## total_percent_of_interventionsachieved_premisesrated_c                                                     0.47
## total_percent_of_interventionsachieved_premisesrated_b                                                     1.00
## total_percent_of_interventionsachieved_premisesrated_a                                                     0.11
## total_percent_of_interventionsachieved_premisesrated_a_e                                                   0.45
## professional_full_time_equivalent_posts_occupied                                                           0.04
##                                                          total_percent_of_interventionsachieved_premisesrated_a
## total_percent_of_interventionsachieved_premisesrated_e                                                     0.05
## total_percent_of_interventionsachieved_premisesrated_d                                                     0.03
## total_percent_of_interventionsachieved_premisesrated_c                                                     0.07
## total_percent_of_interventionsachieved_premisesrated_b                                                     0.11
## total_percent_of_interventionsachieved_premisesrated_a                                                     1.00
## total_percent_of_interventionsachieved_premisesrated_a_e                                                   0.03
## professional_full_time_equivalent_posts_occupied                                                           0.12
##                                                          total_percent_of_interventionsachieved_premisesrated_a_e
## total_percent_of_interventionsachieved_premisesrated_e                                                       0.81
## total_percent_of_interventionsachieved_premisesrated_d                                                       0.85
## total_percent_of_interventionsachieved_premisesrated_c                                                       0.69
## total_percent_of_interventionsachieved_premisesrated_b                                                       0.45
## total_percent_of_interventionsachieved_premisesrated_a                                                       0.03
## total_percent_of_interventionsachieved_premisesrated_a_e                                                     1.00
## professional_full_time_equivalent_posts_occupied                                                             0.00
##                                                          professional_full_time_equivalent_posts_occupied
## total_percent_of_interventionsachieved_premisesrated_e                                               0.03
## total_percent_of_interventionsachieved_premisesrated_d                                              -0.09
## total_percent_of_interventionsachieved_premisesrated_c                                              -0.05
## total_percent_of_interventionsachieved_premisesrated_b                                               0.04
## total_percent_of_interventionsachieved_premisesrated_a                                               0.12
## total_percent_of_interventionsachieved_premisesrated_a_e                                             0.00
## professional_full_time_equivalent_posts_occupied                                                     1.00
## 
## n= 347 
## 
## 
## P
##                                                          total_percent_of_interventionsachieved_premisesrated_e
## total_percent_of_interventionsachieved_premisesrated_e                                                         
## total_percent_of_interventionsachieved_premisesrated_d   0.0000                                                
## total_percent_of_interventionsachieved_premisesrated_c   0.0000                                                
## total_percent_of_interventionsachieved_premisesrated_b   0.0000                                                
## total_percent_of_interventionsachieved_premisesrated_a   0.3941                                                
## total_percent_of_interventionsachieved_premisesrated_a_e 0.0000                                                
## professional_full_time_equivalent_posts_occupied         0.6158                                                
##                                                          total_percent_of_interventionsachieved_premisesrated_d
## total_percent_of_interventionsachieved_premisesrated_e   0.0000                                                
## total_percent_of_interventionsachieved_premisesrated_d                                                         
## total_percent_of_interventionsachieved_premisesrated_c   0.0000                                                
## total_percent_of_interventionsachieved_premisesrated_b   0.0000                                                
## total_percent_of_interventionsachieved_premisesrated_a   0.5895                                                
## total_percent_of_interventionsachieved_premisesrated_a_e 0.0000                                                
## professional_full_time_equivalent_posts_occupied         0.0852                                                
##                                                          total_percent_of_interventionsachieved_premisesrated_c
## total_percent_of_interventionsachieved_premisesrated_e   0.0000                                                
## total_percent_of_interventionsachieved_premisesrated_d   0.0000                                                
## total_percent_of_interventionsachieved_premisesrated_c                                                         
## total_percent_of_interventionsachieved_premisesrated_b   0.0000                                                
## total_percent_of_interventionsachieved_premisesrated_a   0.2053                                                
## total_percent_of_interventionsachieved_premisesrated_a_e 0.0000                                                
## professional_full_time_equivalent_posts_occupied         0.3737                                                
##                                                          total_percent_of_interventionsachieved_premisesrated_b
## total_percent_of_interventionsachieved_premisesrated_e   0.0000                                                
## total_percent_of_interventionsachieved_premisesrated_d   0.0000                                                
## total_percent_of_interventionsachieved_premisesrated_c   0.0000                                                
## total_percent_of_interventionsachieved_premisesrated_b                                                         
## total_percent_of_interventionsachieved_premisesrated_a   0.0505                                                
## total_percent_of_interventionsachieved_premisesrated_a_e 0.0000                                                
## professional_full_time_equivalent_posts_occupied         0.4371                                                
##                                                          total_percent_of_interventionsachieved_premisesrated_a
## total_percent_of_interventionsachieved_premisesrated_e   0.3941                                                
## total_percent_of_interventionsachieved_premisesrated_d   0.5895                                                
## total_percent_of_interventionsachieved_premisesrated_c   0.2053                                                
## total_percent_of_interventionsachieved_premisesrated_b   0.0505                                                
## total_percent_of_interventionsachieved_premisesrated_a                                                         
## total_percent_of_interventionsachieved_premisesrated_a_e 0.5900                                                
## professional_full_time_equivalent_posts_occupied         0.0211                                                
##                                                          total_percent_of_interventionsachieved_premisesrated_a_e
## total_percent_of_interventionsachieved_premisesrated_e   0.0000                                                  
## total_percent_of_interventionsachieved_premisesrated_d   0.0000                                                  
## total_percent_of_interventionsachieved_premisesrated_c   0.0000                                                  
## total_percent_of_interventionsachieved_premisesrated_b   0.0000                                                  
## total_percent_of_interventionsachieved_premisesrated_a   0.5900                                                  
## total_percent_of_interventionsachieved_premisesrated_a_e                                                         
## professional_full_time_equivalent_posts_occupied         0.9771                                                  
##                                                          professional_full_time_equivalent_posts_occupied
## total_percent_of_interventionsachieved_premisesrated_e   0.6158                                          
## total_percent_of_interventionsachieved_premisesrated_d   0.0852                                          
## total_percent_of_interventionsachieved_premisesrated_c   0.3737                                          
## total_percent_of_interventionsachieved_premisesrated_b   0.4371                                          
## total_percent_of_interventionsachieved_premisesrated_a   0.0211                                          
## total_percent_of_interventionsachieved_premisesrated_a_e 0.9771                                          
## professional_full_time_equivalent_posts_occupied

Part 2: Relationship between proportion of successful responses and the number of employees

The scatter plot shows the relationship between the overall percentage of successful interventions for establishments rated A to E and the number of professional full-time equivalent employees. The relationship is very weak and there is no significant link between the two variables: the correlation matrix above gives r = 0.00 with p = 0.9771 for this pair. Thus, we conclude that hiring more employees or professionals does not have any major impact on the success of interventions.

Part 3: Relationship between proportion of successful responses and the number of employees over establishments

For Objective 3 we use the correlation coefficient r, which measures how strongly two variables are related to each other. If r is between -1 and 0 the relationship is negative, and if r is between 0 and 1 the relationship is positive. Here the correlation is 0.23, a weak positive relationship. The p-value tells us whether the correlation is significant; here it is 0.0001126, so the correlation is significant. On this basis we suggest that increasing the number of employees per establishment could improve the success rate of interventions in local authorities.
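
As an illustration of what r and its p-value measure, the sketch below computes Pearson's r from its definition, derives the p-value from the t distribution, and checks both against `cor.test()`. The variables `staff_per_establishment` and `achieved` are simulated and hypothetical; the numbers are not the report's.

```r
# Simulated data: staffing ratio and percentage of interventions achieved
set.seed(123)
n <- 60
staff_per_establishment <- runif(n, 0.001, 0.01)
achieved <- 80 + 1500 * staff_per_establishment + rnorm(n, sd = 5)

# Pearson's r from its definition: covariation divided by the product of spreads
dx <- staff_per_establishment - mean(staff_per_establishment)
dy <- achieved - mean(achieved)
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# Two-sided p-value: t = r * sqrt(n - 2) / sqrt(1 - r^2) on n - 2 degrees of freedom
t_stat <- r_manual * sqrt(n - 2) / sqrt(1 - r_manual^2)
p_manual <- 2 * pt(-abs(t_stat), df = n - 2)

# The built-in test should agree with the manual calculation
builtin <- cor.test(staff_per_establishment, achieved)
print(c(r = r_manual, p = p_manual))
print(builtin)
```

A significant correlation only shows that the two variables move together; it does not by itself show that changing one would change the other.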

title: "Question 2 Section 1"
author: "u2288495"
date: "2023-12-14"
output: html_document

Question 2:

Section 1:

**Data Dictionary**

This dataset is provided by a Publishing Company. The variables in the dataset are as follows:

| Variable | Description |
|---|---|
| sold_by | Name of the publisher selling the book |
| publisher_type | Type of the publisher selling the book |
| genre | Genre of the book |
| avg_review | Average review score given to the book by readers |
| daily_sales | Average number of daily sales, minus refunds, over the whole period |
| total_reviews | Total number of reviews given to the book by readers |
| sale_price | Average price at which the book was sold over the period |

salesofbooks <- read_csv('publisher_sales.csv')
## Rows: 6000 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): sold by, publisher.type, genre
## dbl (4): avg.review, daily.sales, total.reviews, sale.price
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
salesofbooks <- salesofbooks %>% clean_names()
#summary(salesofbooks)

salesofbooks$genre <- as.factor(salesofbooks$genre)
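## dplyr::summarise is called explicitly: a bare summarise() returned one overall mean (79.10967) instead of one mean per genre, which happens when plyr is attached after dplyr and its summarise() ignores group_by()
mean.dailySales.genre <- salesofbooks %>% group_by(genre) %>% dplyr::summarise(daily_sales=mean(daily_sales))
mean.dailySales.genre
## Per-genre means: childrens 55.6, fiction 105.9, non_fiction 75.9 (these match the emmeans output below)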
m.dailySales.by.genre <- lm(daily_sales~genre, data=salesofbooks)
anova(m.dailySales.by.genre)
## Analysis of Variance Table
## 
## Response: daily_sales
##             Df  Sum Sq Mean Sq F value    Pr(>F)    
## genre        2 2562528 1281264  2590.5 < 2.2e-16 ***
## Residuals 5997 2966133     495                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(m.dailySales.by.genre)
## 
## Call:
## lm(formula = daily_sales ~ genre, data = salesofbooks)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -102.396  -13.326   -0.076   13.249  102.094 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       55.5773     0.4973  111.76   <2e-16 ***
## genrefiction      50.3087     0.7033   71.53   <2e-16 ***
## genrenon_fiction  20.2886     0.7033   28.85   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 22.24 on 5997 degrees of freedom
## Multiple R-squared:  0.4635, Adjusted R-squared:  0.4633 
## F-statistic:  2590 on 2 and 5997 DF,  p-value: < 2.2e-16
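
The F statistic, R-squared and residual standard error in the output above can all be reproduced from the sums of squares in the ANOVA table. The short check below redoes that arithmetic with the printed values; the variable names are chosen here for illustration.

```r
# Values copied from the ANOVA table above
ss_genre <- 2562528
df_genre <- 2
ss_resid <- 2966133
df_resid <- 5997

# Mean squares are sums of squares divided by their degrees of freedom
ms_genre <- ss_genre / df_genre
ms_resid <- ss_resid / df_resid

# F compares the variation between genres with the variation within genres
f_value <- ms_genre / ms_resid

# R-squared is the share of the total variation explained by genre
r_squared <- ss_genre / (ss_genre + ss_resid)

# The residual standard error is the square root of the residual mean square
resid_se <- sqrt(ms_resid)

print(round(c(F = f_value, R2 = r_squared, resid_se = resid_se), 4))
```

An R-squared of about 0.46 means genre alone accounts for a little under half of the variation in daily sales.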
(  m.dailySales.by.genre.emm <- emmeans(m.dailySales.by.genre, ~genre)  )
##  genre       emmean    SE   df lower.CL upper.CL
##  childrens     55.6 0.497 5997     54.6     56.6
##  fiction      105.9 0.497 5997    104.9    106.9
##  non_fiction   75.9 0.497 5997     74.9     76.8
## 
## Confidence level used: 0.95
(  m.dailySales.by.genre.contruct <- confint(pairs(m.dailySales.by.genre.emm))  )
##  contrast                estimate    SE   df lower.CL upper.CL
##  childrens - fiction        -50.3 0.703 5997    -52.0    -48.7
##  childrens - non_fiction    -20.3 0.703 5997    -21.9    -18.6
##  fiction - non_fiction       30.0 0.703 5997     28.4     31.7
## 
## Confidence level used: 0.95 
## Conf-level adjustment: tukey method for comparing a family of 3 estimates
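
Because the model contains only `genre`, the estimated marginal means and the pairwise contrasts can be recovered by hand from the coefficients printed by `summary(m.dailySales.by.genre)`. The sketch below redoes that arithmetic with the rounded printed values. The Tukey half-width uses `qtukey()` and assumes equal group sizes (suggested by the identical standard errors), so it is an approximate check rather than a replacement for `emmeans`.

```r
# Coefficients and standard error copied from the model summary above
intercept    <- 55.5773   # mean for the reference genre (childrens)
b_fiction    <- 50.3087   # fiction minus childrens
b_nonfiction <- 20.2886   # non_fiction minus childrens
se_contrast  <- 0.7033    # standard error of a difference between two genres
df_resid     <- 5997

# Genre means: intercept plus the genre's coefficient
genre_means <- c(childrens   = intercept,
                 fiction     = intercept + b_fiction,
                 non_fiction = intercept + b_nonfiction)
print(round(genre_means, 1))

# Pairwise contrasts in the same order as the emmeans output
contrasts <- c("childrens - fiction"     = -b_fiction,
               "childrens - non_fiction" = -b_nonfiction,
               "fiction - non_fiction"   = b_fiction - b_nonfiction)

# Tukey-adjusted 95% half-width for a family of 3 means
half_width <- qtukey(0.95, nmeans = 3, df = df_resid) / sqrt(2) * se_contrast
lower <- contrasts - half_width
upper <- contrasts + half_width
print(round(cbind(estimate = contrasts, lower = lower, upper = upper), 1))
```

None of the intervals contains zero, which is the basis for saying that all three genres differ in average daily sales.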

2.1 Do books from different genres have different daily sales on average?

# Plot the estimated mean daily sales per genre and the pairwise differences, each with 95% CIs
ci <- ggplot(summary(m.dailySales.by.genre.emm), aes(x=genre, y=emmean, ymin=lower.CL, ymax=upper.CL)) +
  geom_point() +
  geom_linerange() +
  labs(y="Daily Sales", x="Genre", subtitle="Error bars are 95% CIs", title="Daily Sales") +
  ylim(50,110) +
  coord_flip()

d.ci <- ggplot(m.dailySales.by.genre.contruct, aes(x=contrast, y=estimate, ymin=lower.CL, ymax=upper.CL)) +
  geom_point() +
  geom_linerange() +
  labs(y="Difference in Daily Sales", x="Contrast", subtitle="Error bars are 95% CIs", title="Difference in Daily Sales") +
  ylim(-55,35) +
  coord_flip()

# Stack the two plots vertically
grid.arrange(ci, d.ci, nrow=2)

ggplot(data = salesofbooks, aes(x = daily_sales, color = genre)) + geom_histogram(binwidth = 1) + xlim(-10,200) + labs(title = "Daily Sales against Genre of Books")

## Checks how daily sales are distributed for each genre; in this histogram the children's genre shows higher counts than fiction and non-fiction
ggplot(data = salesofbooks, aes(x = sale_price)) + geom_histogram(binwidth = 0.2) + labs(title = "Distribution of Count against Sale Price")

ggplot(data = salesofbooks, aes(x = avg_review, fill = sold_by)) + geom_histogram(binwidth = 0.1) + labs(title = "Distribution of Avg Review against number of Books of seller categories")

## Checks the distribution of average reviews across seller categories to see how skewed it is
ggplot(data = salesofbooks, aes(x = total_reviews)) + geom_histogram(binwidth = 2)

## Shows the distribution of total reviews; the 24 books with 0 total reviews are visible in the graph
summarytable <- salesofbooks %>% group_by(genre) %>% summarise_at(vars(daily_sales),              
               list(mean_sales = mean)) 
ggplot(summarytable, aes(y = genre, x = mean_sales)) + 
  geom_col()

## Mean daily sales for each genre, to assess the relationship between genre and sales

2.2 Do books have more/fewer sales depending upon their average review scores and total number of reviews?

#ggplot(salesofbooks, aes(x= avg_review, y = daily_sales)) + geom_point() + geom_smooth() + labs(title = "Relation of Daily Sales against Average Review")
## The plot above showed that some books have an average review of 0
## Filter out the rows where avg_review equals zero
p <- salesofbooks %>%
  filter(avg_review != 0)


ggplot(p, aes(x= avg_review, y = daily_sales)) + geom_point() + geom_smooth()  + labs(title = "Daily Sales against Average Review", subtitle = "With 0 avg reviews removed")

##Added a filter and made a ggplot of avg_review against daily_sales with 0 reviews removed 
cor(salesofbooks$avg_review,salesofbooks$daily_sales, method = "spearman" )
## [1] 0.004354448
##ggplot(salesofbooks, aes(x= total_reviews, y = daily_sales)) + geom_point() + geom_smooth(method = lm) + labs(title = "Daily Sales against Total Reviews")
ggplot(p, aes(x= total_reviews, y = daily_sales)) + geom_point() + geom_smooth(method = lm) + ylim(0, 270) + labs(title = "Daily Sales against Total Reviews", subtitle = "with 0 reviews removed")

##Added a filter and made a ggplot of total_reviews against daily_sales with 0 reviews removed 
cor(salesofbooks$total_reviews,salesofbooks$daily_sales, method = "spearman" )
## [1] 0.678407
ggplot(p, aes(x= avg_review, y = total_reviews)) + geom_point() + geom_smooth(method = lm) + ylim(0, 260) + labs(title = "Total Reviews against Average Review")

rcorr(as.matrix(salesofbooks %>% select(avg_review, daily_sales, total_reviews, sale_price)))
##               avg_review daily_sales total_reviews sale_price
## avg_review          1.00        0.00          0.10      -0.02
## daily_sales         0.00        1.00          0.66      -0.28
## total_reviews       0.10        0.66          1.00      -0.26
## sale_price         -0.02       -0.28         -0.26       1.00
## 
## n= 6000 
## 
## 
## P
##               avg_review daily_sales total_reviews sale_price
## avg_review               0.7474      0.0000        0.2347    
## daily_sales   0.7474                 0.0000        0.0000    
## total_reviews 0.0000     0.0000                    0.0000    
## sale_price    0.2347     0.0000      0.0000
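
The `cor()` calls above use Spearman's rank correlation, while `rcorr()` reports Pearson's correlation by default, which is why the two figures for total reviews and daily sales differ slightly (0.678 and 0.66). The sketch below uses synthetic data, an assumption for illustration only, to show how far the two measures can diverge when a relationship is monotone but not linear.

```r
# Synthetic data: y always increases with x, but not in a straight line
set.seed(7)
x <- runif(100, min = 1, max = 10)
y <- exp(x)

# Pearson measures linear association on the raw values
pearson <- cor(x, y, method = "pearson")

# Spearman measures association between the ranks, so any strictly
# increasing relationship gives a value of 1
spearman <- cor(x, y, method = "spearman")

print(c(pearson = pearson, spearman = spearman))
```

When the two measures are close, as they are for total reviews and daily sales here, the relationship is roughly linear and either measure tells the same story.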

2.3 What is the effect of sale price upon the number of sales, and is this different across genres?

ggplot(salesofbooks, aes(x = daily_sales)) + geom_boxplot() + labs(x="daily sales", y="frequency") + facet_grid(rows = vars(genre), margins = TRUE)

m.daily.sales.by.genre <- lm(daily_sales ~ genre, data = salesofbooks)
 
(m.daily.sales.by.genre.emm <- emmeans(m.daily.sales.by.genre, ~genre))
##  genre       emmean    SE   df lower.CL upper.CL
##  childrens     55.6 0.497 5997     54.6     56.6
##  fiction      105.9 0.497 5997    104.9    106.9
##  non_fiction   75.9 0.497 5997     74.9     76.8
## 
## Confidence level used: 0.95
summary(m.daily.sales.by.genre)
## 
## Call:
## lm(formula = daily_sales ~ genre, data = salesofbooks)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -102.396  -13.326   -0.076   13.249  102.094 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       55.5773     0.4973  111.76   <2e-16 ***
## genrefiction      50.3087     0.7033   71.53   <2e-16 ***
## genrenon_fiction  20.2886     0.7033   28.85   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 22.24 on 5997 degrees of freedom
## Multiple R-squared:  0.4635, Adjusted R-squared:  0.4633 
## F-statistic:  2590 on 2 and 5997 DF,  p-value: < 2.2e-16
m.sales_review_A <- lm(daily_sales ~ avg_review + total_reviews, data = salesofbooks)
summary(m.sales_review_A)
## 
## Call:
## lm(formula = daily_sales ~ avg_review + total_reviews, data = salesofbooks)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -103.396  -14.645   -1.059   13.690  122.429 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   23.870506   2.341271  10.196  < 2e-16 ***
## avg_review    -3.943548   0.513120  -7.685 1.77e-14 ***
## total_reviews  0.543329   0.007823  69.451  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 22.6 on 5997 degrees of freedom
## Multiple R-squared:  0.4458, Adjusted R-squared:  0.4456 
## F-statistic:  2412 on 2 and 5997 DF,  p-value: < 2.2e-16
cbind(coef(m.sales_review_A), confint(m.sales_review_A))
##                              2.5 %     97.5 %
## (Intercept)   23.870506 19.2807719 28.4602399
## avg_review    -3.943548 -4.9494480 -2.9376473
## total_reviews  0.543329  0.5279926  0.5586653
m.sales_review_B <- lm(daily_sales ~ avg_review * total_reviews, data = salesofbooks)
summary(m.sales_review_B)
## 
## Call:
## lm(formula = daily_sales ~ avg_review * total_reviews, data = salesofbooks)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -104.08  -14.63   -0.92   13.82   92.33 
## 
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)    
## (Intercept)               63.546900   4.178047  15.210  < 2e-16 ***
## avg_review               -13.683765   0.993159 -13.778  < 2e-16 ***
## total_reviews              0.164754   0.034068   4.836 1.36e-06 ***
## avg_review:total_reviews   0.091688   0.008035  11.411  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 22.36 on 5996 degrees of freedom
## Multiple R-squared:  0.4576, Adjusted R-squared:  0.4573 
## F-statistic:  1686 on 3 and 5996 DF,  p-value: < 2.2e-16
cbind(coef(m.sales_review_B), confint(m.sales_review_B))
##                                              2.5 %      97.5 %
## (Intercept)               63.54690004  55.35642562  71.7373745
## avg_review               -13.68376484 -15.63071313 -11.7368165
## total_reviews              0.16475390   0.09796872   0.2315391
## avg_review:total_reviews   0.09168842   0.07593650   0.1074403
anova(m.sales_review_A, m.sales_review_B)
## Analysis of Variance Table
## 
## Model 1: daily_sales ~ avg_review + total_reviews
## Model 2: daily_sales ~ avg_review * total_reviews
##   Res.Df     RSS Df Sum of Sq      F    Pr(>F)    
## 1   5997 3064100                                  
## 2   5996 2998976  1     65125 130.21 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
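Because the interaction term is significant, the slope for one predictor depends on the value of the other. The sketch below is illustrative only: it copies the printed estimates from summary(m.sales_review_B) rather than refitting the model, and the helper names (slope_total_reviews, slope_avg_review, crossover) are hypothetical.

```r
# Estimates copied from summary(m.sales_review_B) above.
b <- c(avg_review = -13.683765, total_reviews = 0.164754, interaction = 0.091688)

# Slope of daily sales with respect to total_reviews at a given average rating.
slope_total_reviews <- function(avg_review) {
  unname(b["total_reviews"] + b["interaction"] * avg_review)
}

# Slope of daily sales with respect to avg_review at a given number of reviews.
slope_avg_review <- function(total_reviews) {
  unname(b["avg_review"] + b["interaction"] * total_reviews)
}

# total_reviews slope for books rated 3, 4 and 5 on average.
print(sapply(c(3, 4, 5), slope_total_reviews))

# Review count at which the avg_review slope changes sign.
crossover <- unname(-b["avg_review"] / b["interaction"])
print(round(crossover, 1))
```

Under these estimates the avg_review slope is negative for books with fewer than roughly 149 reviews and positive above that, which is one way to read why the additive model reports a negative avg_review coefficient overall.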
ggplot( data = salesofbooks, aes(x= daily_sales, y= avg_review)) + geom_point() +geom_smooth(method = "lm") #Plotting a scatter plot to know the relationship between sales and average review

ggplot( data = salesofbooks, aes(x= daily_sales, y= total_reviews)) + geom_point() +geom_smooth(method = "lm") #Plotting a scatter plot to know the relationship between sales and total review

ggplot(salesofbooks, aes(x= avg_review, y = total_reviews)) + geom_point() + geom_smooth(method = lm)

m.sales_price_for_genre_A <- lm(daily_sales ~ sale_price + genre, data = salesofbooks)
summary(m.sales_price_for_genre_A)
## 
## Call:
## lm(formula = daily_sales ~ sale_price + genre, data = salesofbooks)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -102.357  -13.311    0.031   13.097  102.924 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       63.8931     1.5195   42.05  < 2e-16 ***
## sale_price        -0.8324     0.1438   -5.79  7.4e-09 ***
## genrefiction      48.6713     0.7562   64.36  < 2e-16 ***
## genrenon_fiction  18.5587     0.7624   24.34  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 22.18 on 5996 degrees of freedom
## Multiple R-squared:  0.4665, Adjusted R-squared:  0.4662 
## F-statistic:  1748 on 3 and 5996 DF,  p-value: < 2.2e-16
cbind(coef(m.sales_price_for_genre_A), confint(m.sales_price_for_genre_A))
##                                 2.5 %     97.5 %
## (Intercept)      63.8930553 60.914300 66.8718111
## sale_price       -0.8324344 -1.114286 -0.5505827
## genrefiction     48.6713347 47.188823 50.1538467
## genrenon_fiction 18.5587230 17.064212 20.0532344
m.sales_price_for_genre_B <- lm(daily_sales ~ sale_price * genre, data = salesofbooks)
summary(m.sales_price_for_genre_B)
## 
## Call:
## lm(formula = daily_sales ~ sale_price * genre, data = salesofbooks)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -102.38  -13.37    0.03   13.08  102.37 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                  72.8781     2.5025  29.122  < 2e-16 ***
## sale_price                   -1.7319     0.2456  -7.053 1.95e-12 ***
## genrefiction                 35.1993     3.2740  10.751  < 2e-16 ***
## genrenon_fiction              6.5492     3.2040   2.044 0.040989 *  
## sale_price:genrefiction       1.4587     0.3546   4.114 3.94e-05 ***
## sale_price:genrenon_fiction   1.2817     0.3469   3.695 0.000222 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 22.15 on 5994 degrees of freedom
## Multiple R-squared:  0.4683, Adjusted R-squared:  0.4679 
## F-statistic:  1056 on 5 and 5994 DF,  p-value: < 2.2e-16
cbind(coef(m.sales_price_for_genre_B), confint(m.sales_price_for_genre_B))
##                                            2.5 %    97.5 %
## (Intercept)                 72.878117 67.9722348 77.784000
## sale_price                  -1.731864 -2.2132471 -1.250482
## genrefiction                35.199273 28.7810791 41.617467
## genrenon_fiction             6.549246  0.2682356 12.830257
## sale_price:genrefiction      1.458709  0.7636155  2.153802
## sale_price:genrenon_fiction  1.281703  0.6016749  1.961731
anova(m.sales_price_for_genre_A, m.sales_price_for_genre_B)
## Analysis of Variance Table
## 
## Model 1: daily_sales ~ sale_price + genre
## Model 2: daily_sales ~ sale_price * genre
##   Res.Df     RSS Df Sum of Sq      F   Pr(>F)    
## 1   5996 2949642                                 
## 2   5994 2939524  2     10118 10.316 3.37e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
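The interaction coefficients can be turned into one price slope per genre by adding each genre's interaction term to the baseline (childrens) slope. This is a minimal sketch using the printed point estimates from summary(m.sales_price_for_genre_B); the variable names are hypothetical, and it does not give confidence intervals for the combined slopes.

```r
# Estimates copied from summary(m.sales_price_for_genre_B); childrens is the baseline.
price_slope_childrens <- -1.7319
interaction_offsets <- c(childrens = 0, fiction = 1.4587, non_fiction = 1.2817)

# Genre-specific slope = baseline slope + that genre's interaction term.
price_slopes <- price_slope_childrens + interaction_offsets
print(round(price_slopes, 2))
```

All three slopes are negative, but the fiction and non-fiction slopes are much closer to zero than the childrens slope. If standard errors for the per-genre slopes are needed, something like emmeans::emtrends(m.sales_price_for_genre_B, ~ genre, var = "sale_price") on the fitted model should provide them.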
vif(m.sales_price_for_genre_A)
##                GVIF Df GVIF^(1/(2*Df))
## sale_price 1.229697  1        1.108917
## genre      1.229697  2        1.053051
vif(m.sales_price_for_genre_B, type = 'predictor')
## GVIFs computed for predictors
##            GVIF Df GVIF^(1/(2*Df)) Interacts With Other Predictors
## sale_price    1  5               1          genre             --  
## genre         1  5               1     sale_price             --
salesbytotalreview <- lm(daily_sales ~ total_reviews, data = salesofbooks)
summary(salesbytotalreview)
## 
## Call:
## lm(formula = daily_sales ~ total_reviews, data = salesofbooks)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -103.202  -14.824   -1.026   13.620  138.424 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   7.875622   1.077639   7.308 3.06e-13 ***
## total_reviews 0.537048   0.007818  68.694  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 22.71 on 5998 degrees of freedom
## Multiple R-squared:  0.4403, Adjusted R-squared:  0.4402 
## F-statistic:  4719 on 1 and 5998 DF,  p-value: < 2.2e-16
(salesofbooks <- salesofbooks %>% mutate(sales.hat=predict(salesbytotalreview)))
## # A tibble: 6,000 × 8
##    sold_by                 publi…¹ genre avg_r…² daily…³ total…⁴ sale_…⁵ sales…⁶
##    <chr>                   <chr>   <fct>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
##  1 Random House LLC        big fi… chil…    4.44    61.6      92    8.03    57.3
##  2 Amazon Digital Service… indie   non_…    4.19    74.9     130    9.08    77.7
##  3 Amazon Digital Service… small/… non_…    3.71    66.0     118    9.48    71.2
##  4 Amazon Digital Service… small/… fict…    4.72    85.2     179   12.3    104. 
##  5 Simon and Schuster Dig… big fi… chil…    4.65    37.7     111    5.78    67.5
##  6 Simon and Schuster Dig… big fi… chil…    4.81    70.6     106   11.7     64.8
##  7 Amazon Digital Service… small/… fict…    4.33   172.      205   10.3    118. 
##  8 HarperCollins Publishe… big fi… chil…    4.21    59.4      86   11.4     54.1
##  9 Amazon Digital Service… small/… fict…    3.95   134.      161    7.08    94.3
## 10 Amazon Digital Service… small/… chil…    4.66    62.2      81   10.8     51.4
## # … with 5,990 more rows, and abbreviated variable names ¹​publisher_type,
## #   ²​avg_review, ³​daily_sales, ⁴​total_reviews, ⁵​sale_price, ⁶​sales.hat
ggplot(salesofbooks, mapping = aes(y = daily_sales, x = total_reviews, ymin = daily_sales, ymax = sales.hat)) + 
  geom_point() + 
  geom_linerange(colour = "grey") + # draws the residuals described in the subtitle (observed to fitted value)
  labs(x = "total reviews", y = "daily sales", title = "Total Reviews against Daily Sales", subtitle = "Vertical lines show the residuals") + 
  geom_smooth(method = lm)

## Plotted to examine the relationship between total reviews and daily sales. The relationship is positive: books with more total reviews tend to have higher daily sales.
salesbyavgreview <- lm(daily_sales ~ avg_review, data = salesofbooks)
summary(salesbyavgreview)
## 
## Call:
## lm(formula = daily_sales ~ avg_review, data = salesofbooks)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -79.944 -22.299  -4.837  18.943 128.948 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  80.0517     2.9510  27.127   <2e-16 ***
## avg_review   -0.2208     0.6854  -0.322    0.747    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 30.36 on 5998 degrees of freedom
## Multiple R-squared:  1.729e-05,  Adjusted R-squared:  -0.0001494 
## F-statistic: 0.1037 on 1 and 5998 DF,  p-value: 0.7474
salesbyavgreviewtotalreviews <- lm(daily_sales ~ avg_review + total_reviews, data= salesofbooks)
anova(salesbyavgreviewtotalreviews)
## Analysis of Variance Table
## 
## Response: daily_sales
##                 Df  Sum Sq Mean Sq   F value Pr(>F)    
## avg_review       1      96      96    0.1871 0.6653    
## total_reviews    1 2464464 2464464 4823.4038 <2e-16 ***
## Residuals     5997 3064100     511                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
children <- salesofbooks %>% filter(genre == 'childrens')
fiction <- salesofbooks %>% filter(genre == 'fiction')
nonfiction <- salesofbooks %>% filter(genre == 'non_fiction')
ggplot(data = nonfiction, aes(x = sale_price, y = daily_sales)) + geom_point() + geom_smooth(method = lm) + labs(title = "Daily Sales Against Sale Price")

salesbysalepricegenre <- lm(daily_sales ~ sale_price * genre, data=salesofbooks)
anova(salesbysalepricegenre)
## Analysis of Variance Table
## 
## Response: daily_sales
##                    Df  Sum Sq Mean Sq  F value    Pr(>F)    
## sale_price          1  426054  426054  868.770 < 2.2e-16 ***
## genre               2 2152964 1076482 2195.060 < 2.2e-16 ***
## sale_price:genre    2   10118    5059   10.316  3.37e-05 ***
## Residuals        5994 2939524     490                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Question 2 Section 2:

Part 1. Different genres have different daily sales on average

From the figure above, fiction has almost double the average daily sales of children's books (estimated means of 105.9 versus 55.6), with non-fiction in between at 75.9. The 95% confidence intervals for the three genres do not overlap, so we conclude that different genres have different daily sales on average.
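As a rough check on these figures, the genre means can be recovered from the regression coefficients, because each genre coefficient is an offset from the baseline genre (childrens). The sketch below copies the printed estimates from summary(m.daily.sales.by.genre) rather than refitting the model; the variable names are hypothetical.

```r
# Coefficients copied from summary(m.daily.sales.by.genre); childrens is the baseline.
coefs <- c(intercept = 55.5773, fiction = 50.3087, non_fiction = 20.2886)

# Each genre mean is the intercept plus that genre's offset.
genre_means <- c(
  childrens   = unname(coefs["intercept"]),
  fiction     = unname(coefs["intercept"] + coefs["fiction"]),
  non_fiction = unname(coefs["intercept"] + coefs["non_fiction"])
)
print(round(genre_means, 1))

# Ratio of fiction to childrens mean daily sales.
fiction_ratio <- unname(genre_means["fiction"] / genre_means["childrens"])
print(round(fiction_ratio, 2))
```

The recovered means agree with the emmeans output, and the ratio of about 1.9 is what "almost double" refers to.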

Part 2. Daily sales increase as average review scores and total reviews increase

The relationship between daily sales and average review score cannot be established from the scatter plot alone: the correlation between the two is approximately 0 (p = 0.75), and in a regression on average review alone the slope is -0.22 (p = 0.75). We therefore also consider the relationship between total reviews and daily sales, and the relationship between average review and total reviews.

The three scatter plots above show daily sales against average review, daily sales against total reviews, and total reviews against average review. Daily sales rise clearly with total reviews, whereas they show no clear trend with average review; average review and total reviews are themselves only weakly positively correlated (r = 0.10). The Spearman correlation between total reviews and daily sales is 0.68, which supports a positive association. In the additive model (daily_sales ~ avg_review + total_reviews), each additional review is associated with about 0.54 more daily sales (95% CI 0.53 to 0.56), and each one-point increase in average review is associated with about 3.9 fewer daily sales (95% CI 2.9 to 4.9), holding the other predictor constant. Daily sales therefore increase with total reviews, but not with average review score.
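To make the size of these effects concrete, the additive model can be evaluated by hand for a hypothetical book. This is only a sketch: the estimates are copied from summary(m.sales_review_A), while the function name predict_sales and the example values (a 4-star book with 100 reviews) are assumptions for illustration.

```r
# Estimates copied from summary(m.sales_review_A).
b <- c(intercept = 23.870506, avg_review = -3.943548, total_reviews = 0.543329)

# Predicted daily sales for given review values under the additive model.
predict_sales <- function(avg_review, total_reviews) {
  unname(b["intercept"] + b["avg_review"] * avg_review +
           b["total_reviews"] * total_reviews)
}

base         <- predict_sales(avg_review = 4, total_reviews = 100)  # hypothetical book
more_reviews <- predict_sales(avg_review = 4, total_reviews = 110)  # ten more reviews
higher_score <- predict_sales(avg_review = 5, total_reviews = 100)  # one point higher rating

print(c(base = base, more_reviews = more_reviews, higher_score = higher_score))
print(c(gain_from_reviews = more_reviews - base,
        change_from_score = higher_score - base))
```

With the fitted model object available, predict(m.sales_review_A, newdata = ...) would give the same predictions without copying coefficients by hand.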

Part 3. Negative effect of sale price upon the number of sales, different across genre

```

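As sale price increases, daily sales decrease slightly, so there is a negative relationship between sale price and daily sales: in the additive model, each one-unit increase in sale price is associated with about 0.83 fewer daily sales (95% CI 0.55 to 1.11), controlling for genre. The effect also differs across genres. Adding the sale price by genre interaction significantly improves the model, F(2, 5994) = 10.32, p < .001, and the price slope is steepest for children's books (about -1.73 per unit of price) and much shallower for fiction (about -0.27) and non-fiction (about -0.45).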